Structural similarity for document image classification and retrieval

Authors

  • Jayant Kumar
  • Peng Ye
  • David S. Doermann
Abstract

This paper presents a novel approach to defining document image structural similarity for the applications of classification and retrieval. We first build a codebook of SURF descriptors extracted from a set of representative training images. We then encode each document using this codebook and model the spatial relationships between codewords by recursively partitioning the image and computing histograms of codewords in each partition. A random forest classifier is trained with the resulting features and used for classification and retrieval. We demonstrate the effectiveness of our approach on table and tax form retrieval, and show that the proposed method outperforms previous approaches even when the training data is limited.
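The partition-and-histogram step described in the abstract can be illustrated with a minimal sketch. It assumes that keypoint locations and their codeword indices are already available (SURF extraction and codebook construction are omitted), and it uses a quadtree-style split into 2^level by 2^level cells, which is an assumption: the abstract does not specify the exact partitioning scheme. The function name `partition_histograms` and all parameters are hypothetical.

```python
def partition_histograms(points, codes, width, height, n_codewords, depth):
    """Concatenate per-cell codeword histograms for partition levels 0..depth.

    points: list of (x, y) keypoint locations inside a width-by-height image
    codes:  list of codeword indices in [0, n_codewords), one per keypoint
    The quadtree-style split below is an illustrative assumption, not
    necessarily the partitioning used in the paper.
    """
    features = []
    for level in range(depth + 1):
        cells = 2 ** level  # number of cells along each axis at this level
        hist = [0] * (cells * cells * n_codewords)
        for (x, y), code in zip(points, codes):
            # Locate the cell containing the keypoint; clamp points that
            # lie exactly on the right or bottom image border.
            col = min(int(x / width * cells), cells - 1)
            row = min(int(y / height * cells), cells - 1)
            hist[(row * cells + col) * n_codewords + code] += 1
        features.extend(hist)
    return features


# Four keypoints, one per image quadrant, quantized to two codewords.
example = partition_histograms(
    points=[(10, 10), (90, 10), (10, 90), (90, 90)],
    codes=[0, 1, 0, 1],
    width=100, height=100, n_codewords=2, depth=1,
)
```

The resulting fixed-length vector is the kind of feature that could then be passed to a classifier such as a random forest. Coarse levels capture which codewords occur at all, while finer levels capture where they occur on the page.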

Similar resources

Document Analysis And Classification Based On Passing Window

In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. The system includes preprocessing, document segmentation, feature extraction, and document classification. In preprocessing, a document image is enhanced by removing noise, binarizing, and detecting and correcting image skew. In document segmentation, an algorith...

Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback

Keyword spotting is a well-known method in document image retrieval, in which document images are searched using a query word image. This paper proposes an approach to document image retrieval based on keyword spotting, built on a framework that uses relevance feedback. Relevance feedback, an interactive and efficient method, is used in this paper to imp...

Measuring Structural Similarity of Document Pages for Searching Document Image Databases

Current document management and database systems provide text search and retrieval capabilities, but generally lack the ability to utilize the documents’ logical and physical structures. This paper describes a general system for document image retrieval that is able to make use of document structure. It discusses the use of structural similarity for retrieval; it defines a measure of structural...

Document Image Retrieval Based on Layout Structural Similarity

In this paper, we describe issues related to the measurement of structural similarity between document images. We define structural similarity, and discuss the benefits of using it as a complement to content similarity for querying document image databases. We present an approach to computing a geometrically invariant structural similarity, and use this measure to search document image database...

Learning Document Image Features With SqueezeNet Convolutional Neural Network

The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered the current state-of-the-art models for this task. However, these classifiers have two major drawbacks: the huge computational power demand for...

Journal:
  • Pattern Recognition Letters

Volume 43  Issue 

Pages  -

Publication date: 2014